Benchmark Structures and Conformational Landscapes of Amino Acids in the Gas Phase: A Joint Venture of Machine Learning, Quantum Chemistry, and Rotational Spectroscopy

The accurate characterization of prototypical bricks of life can strongly benefit from the integration of high resolution spectroscopy and quantum mechanical computations. We have selected a number of representative amino acids (glycine, alanine, serine, cysteine, threonine, aspartic acid and asparagine) to validate a new computational setup rooted in quantum-chemical computations of increasing accuracy guided by machine learning tools. Together with low-lying energy minima, the barriers ruling their interconversion are evaluated in order to unravel possible fast relaxation paths. Vibrational and thermal effects are also included in order to estimate relative free energies at the temperature of interest in the experiment. The spectroscopic parameters of all the most stable conformers predicted by this computational strategy, which do not have low-energy relaxation paths available, closely match those of the species detected in microwave experiments. Together with their intrinsic interest, these accurate results represent ideal benchmarks for more approximate methods.


INTRODUCTION
Thanks to its high resolution and noninvasivity, gas-phase molecular spectroscopy has become the method of choice to investigate the role of intrinsic stereoelectronic effects in tuning the physical-chemical properties of biomolecule building blocks. 1,2 In particular, the supersonic-jet expansion technique 3 coupled to laser ablation 4 is allowing the recording of gas-phase microwave (MW) spectra for these thermolabile compounds, which are usually characterized by high melting points. 5 However, the fast relaxation of some structures to more stable counterparts in the presence of low energy barriers can bias any direct thermochemical interpretation of the results provided by this technique. 6,7 Accurate quantum chemical (QC) computations can help to solve this kind of problem, 8,9 but the effective exploration of flat potential energy surfaces (PESs) and the characterization of their stationary points for medium-to large-size flexible systems are still challenging for at least two different reasons. From the one side, the size of the systems prevents a brute force approach employing very accurate but very expensive state-of-the-art QC methodologies. 10−12 From the other side, the very powerful local optimization techniques developed for semirigid systems are not effective for the exploration of rugged potential energy surfaces (PES) characterized by a huge number of energy minima possibly separated by low-energy barriers. 13,14 This situation calls for an integrated computational approach employing QC models of increasing accuracy in the different steps of an exploration/exploitation strategy guided by machine learning (ML) tools. 13,15−17 The effective strategy of this kind we have been developing in the past few years starts from a knowledge-based selection and constrained geometry optimizations of a limited number of conformers employing a fast semiempirical method. 14,18 Next, an effective exploration of the whole conformational PES is performed by the same semiempirical method guided by a purposely tailored evolutionary algorithm with the aim of finding other low-lying minima. 13 The results of this step are refined by hybrid and then double-hybrid density functionals, 19,20 and possible relaxation paths between pairs of adjacent energy minima are identified. 16 Once a panel of low-energy minima has been defined, accurate relative energies are computed by reducedscaling composite methods. 21−26 These results are integrated by zero point energies (ZPE) and thermal contributions to enthalpies and entropies employing anharmonic approaches rooted in the second order vibrational perturbation theory (VPT2) 27−34 and proper treatment of hindered rotations. 35,36 Finally, accurate spectroscopic parameters of the energy minima with nonnegligible populations under the experimental conditions of interest are computed. 37 In the specific case of rotational spectroscopy, improved equilibrium rotational constants are obtained by refining the optimized geometries by a linear regression approach. 20,38 Among the main biomolecule building blocks, natural αamino acids, which exist exclusively in neutral form in the gas phase, represent a particularly appealing playground because their rich conformational landscape is tuned by the competition among different kinds of intramolecular hydrogen bonds. At the same time, MW results are available for several conformers of most natural α-amino acids, 39−50 which represent very demanding benchmarks for the a priori prediction of structural and spectroscopic parameters. We have therefore selected the glycine and alanine prototypes together with a panel of α-amino acids with polar side chains (serine, threonine, cysteine, aspartic acid, and asparagine) with the aim of providing benchmark results allowing unbiased comparisons with experimental results. In fact, the current standards for the computation of MW parameters of biomolecule building blocks in the gas phase (see, e.g., refs 1, 6, 7, 39, and 51) employ QC methods of limited accuracy, pay marginal attention to the geometrical parameters, and neglect vibrational corrections. However, these limitations hamper any a priori prediction of the spectroscopic outcome, allowing at most its a posteriori interpretation in terms of the agreement between experimental and computed spectroscopic parameters for a predefined number of conformers.
Based on these premises, the goal of the present study is to improve and validate a general strategy able to find all the conformers detectable in supersonic jet expansions taking also into account fast relaxation processes possibly leading to the disappearance of some low-lying species. Unbiased comparison with spectroscopic results is made possible by the accuracy of the computational results, which will be shown to provide mean unsigned errors (MUEs) within 20 MHz for rotational constants and 10 cm −1 for both relative energies and vibrational frequencies (entering zero point energies and thermal contributions to thermodynamic functions). Together with their intrinsic interest of the studied molecules, these results will provide also a reference set for more approximate methods and/or search techniques.

PES EXPLORATION
The general strategy for the exploration of conformational PESs is based on a continuous perception of molecular structures performed by the PROXIMA software, 52 which is able to detect characteristic structural motifs and to separate soft (in the present context dihedral angles) and hard degrees of freedom. Then, a knowledge-based systematic search of soft degrees of freedom 14 can be optionally performed, which produces a panel of guess structures (e.g., the 3 n staggered conformers generated by rotations around n nonterminal single bonds, which are not a part of cycles). The geometries of these candidates are next optimized using the fast GFN2-XTB semiempirical method, 18 which has been selected because it tends to underestimate energy differences (i.e., to produce a too large set of candidates), which allows a safer use of energy thresholds for further processing. Next, a custom implementation of the island model evolutionary algorithm (IM-EA) 53 is employed to produce other candidates starting from an initial population (P 0 ) generated by the so-called Latin Hypercube stratified sampling 54 in order to maximize the diversity of soft degrees of freedom. The chemical descriptor (fitness) of each structure is the relative electronic energy obtained by GFN2-XTB geometry optimizations of the stiff degrees of freedom. Improved populations are then built iteratively for a given number of cycles by applying, with predetermined probability, different genetic operators, namely, crossover (interpolation of the features of different related structures for creating new ones), mutation (change of one or more soft degrees of freedom with some stochastic rule), and selection (high chance for high fitness structures of propagating their features in the next cycles). In the IM-EA, the different operators act separately on disjoint regions of the conformational landscape (islands), which are mixed only at predefined intervals by a dedicated operator (migration). Furthermore, some of the best structures found in each cycle are directly transferred to the next cycle (the so-called Hall of Fame). 55 All those choices are dictated by the high cost of evaluating the fitness of a new structure by constrained geometry optimizations. As a consequence, high fitness structures are worth being preserved in the population until some significantly improved structure is found. Typical values of the initial population, maximum Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article number of cycles, and number of islands are 100, 50, and 4, which result in about 1000 constrained geometry optimizations for each run of the algorithm. In order to further increase the coverage of the conformational space, 4 runs with different initial populations are performed for each molecular system. The full set of parameters employed in the IM-EA algorithm is given in Table S1 of the Supporting Information (SI), while further details are given in refs 13 and 17. Figure 1 shows a schematic flowchart of the current version of the whole algorithm, which is available under the GPL3 license at https://github.com/tuthmose/IM_EA. At the end of the whole exploration, low-energy conformers within a predefined energy range are selected from the panel of structures issued from IM-EA and, possibly, knowledge-based steps by eliminating too similar structures (in terms of rotational constants and root-mean-square deviations of heavy atom positions) and then performing single point energy evaluations at the B3LYP/jun-cc-pVDZ level, 56,57 also including Grimme's D3BJ dispersion corrections. 58 In the following, this computational model will be referred to simply as B3. The choice of the specific functional is not critical in this step because it is used only for the selection of an initial panel of structures to be next refined at higher levels. The B3 model has been selected because it is routinely employed in the interpretation of MW studies and, more importantly, provides reasonable anharmonic corrections (vide infra).
In the next step, structures lying within a smaller energy range are optimized at the same level, and the surviving ones define the panel of candidates for the final structural refinement, which is performed employing the revDSD-PBEP86-D3BJ/jun-cc-pv(T+d)Z model 59−61 (hereafter rDSD) for both geometry optimization and evaluation of harmonic force fields. 62 The rDSD functional has been selected because several studies have shown that it provides excellent geometrical structures, 38 dipole moments, 63 spectroscopic parameters, 37 noncovalent intermolecular interactions, 23,64 and conformational landscapes. 10,65,66 This composite strategy allows for strongly reducing the number of expensive geometry optimizations by hybrid and, especially, double-hybrid functionals without any loss of accuracy in the final results. The different energy thresholds depend on the system and the spectroscopic technique of interest. For the specific case of rotational spectroscopy, a conservative limit for the relative stability of detectable conformers is around 900 cm −1 (which corresponds to a relative population of about 1% at room temperature, where kT/hc = 207 cm −1 ). 1,16 As a consequence, the typical thresholds for the acceptance of semiempirical structures, B3 geometry optimizations, and final rDSD refinement are 2500, 1500, and 1000 cm −1 , respectively. These choices lead to about 100 B3 computations (including both single point and geometry optimizations) and no more than 20 rDSD geometry optimizations for each molecular system.
As mentioned in the Introduction, conformational relaxation can take place under the experimental conditions whenever the energy barriers ruling the interconversion are sufficiently low, with an upper limit of about 400 cm −1 being usually employed for discriminating in rotational spectroscopy of amino acids and related compounds. 6,7,67 With the aim of unraveling fast conformational relaxations, we always perform relaxed torsional scans at the rDSD level in order to obtain preliminary information on low-energy interconversion paths. Next, after precise location of transition states (TSs) by full geometry optimizations, their nature is checked by computing Hessian matrices.

RELATIVE STABILITIES AND SPECTROSCOPIC PARAMETERS
The typical MUEs of rDSD bond lengths (0.003 Å) and valence angles (0.003 radians, i.e., 0.15°) observed in the large SE100 database 38 are largely sufficient to obtain accurate relative electronic energies of different conformers by singlepoint energy evaluations using composite methods rooted in the coupled cluster (CC) ansatz. 68 In particular, the CC model including single, double, and perturbative estimate of triple excitations (CCSD(T)) 69 is considered the gold standard for this kind of computations provided that complete basis set (CBS) extrapolation and core valence (CV) correlation are taken into the proper account. The key idea of the reduced cost Cheap scheme (ChS) is that, starting from frozen core (fc) CCSD(T) computations in conjunction with the cc-pVTZ basis set, 57 CBS and CV terms can be computed with good accuracy and negligible additional cost employing second order Møller−Plesset perturbation theory (MP2). 70 Several benchmarks 22,23 have shown that improved noncovalent interactions can be obtained employing partially augmented (jun-cc-pV(n +d)Z) basis sets, 61,71 and the corresponding model is labeled junChS. Replacement of conventional methods with the explicitly correlated (F12) variants leads to the junChSF12 model, which is even more accurate without any excessive additional cost. In detail, the starting point is the frozen-core (fc) CCSD(T)-F12b(3C/FIX) method 72−74 again in conjunction with the jun-cc-pV(T+d)Z basis set. 61,75 The corresponding auxiliary basis sets are also employed for resolution of the identity and density fitting, and the geminal exponent (γ) was fixed to 1.0 a 0 −1 . 75,76 CBS extrapolation is carried out with the standard n −3 two-point formula 77 employing MP2F12/jun-cc-pV(X+d)Z energies with X = T and Q. The CV contribution is then incorporated as the difference between all-electron (ae) and fc MP2F12 calculations, both with the cc-pCVW(T+d)Z basis set. 78 A systematic study of noncovalent intermolecular interactions 23 showed that the junChSF12 approach is affected by small basis set superposition errors (BSSE), which would be difficult to take into account for intramolecular interactions. Furthermore, comparison with the most accurate results available for a panel of representative noncovalent complexes provided an average absolute error smaller than 10 cm −1 . 22,23 To determine the relative stability of different low-energy minima, one has to move from electronic energy differences to the corresponding relative enthalpies at 0K (ΔH 0°) or free energies (ΔG°) at a temperature depending on the experimental conditions. The vibrational contributions to thermodynamic functions are usually computed by the harmonic oscillator (HO) model, which shows the largest errors in the high frequency (overestimated contributions to zero point energies) and low frequency (overestimated contributions to entropies) regions.
The first issue is solved in the present work by estimating anharmonic contributions in the framework of second order vibrational perturbation theory (VPT2), which provides analytical and resonance free expressions for the ZPEs. 79 Harmonic (rDSD) and anharmonic (B3) contributions are employed in this connection, since a recent benchmark study has shown that for semirigid molecules the average absolute error of zero point energies with respect to accurate experimental results is reduced from 53 to 17 cm −1 when going from the HO to the VPT2 anharmonic model. 80 The treatment of low frequency contributions (typically less than 100 cm −1 ) is more involved because different modes (e.g., torsions, inversions, etc.) need be identified, characterized, and treated by proper variational anharmonic computations. 36,81 In the same benchmark study mentioned above in connection with anharmonic ZPEs, 80 it has been shown that the simple one-dimensional hindered rotor model proposed by Ayala 82 in conjunction with the VPT2 model for the other vibrational modes leads to remarkably accurate vibrational entropies for both semirigid and flexible molecules for which accurate experimental results are available. In particular, an average absolute error of 3 cm −1 is obtained for the TΔS contribution to free energies at room temperature. In the present context, test computations showed that the unbiased detection of hindered rotations becomes ambiguous for some conformers, so that we prefer to resort to the much simpler and black-box quasi-harmonic (QH) approximation. 35,83 In the QH approach, below a given cutoff value, entropic terms are obtained from the free-rotor model, and a damping function is used to interpolate between free-rotor and harmonic oscillator expressions close to the cutoff frequency.
The leading terms of MW spectra are the rotational constants of the vibrational ground-state (B 0 i , where i refers to the inertial axes a, b, c), which include vibrational corrections (ΔB vib i ) in addition to equilibrium rotational constants (B e i ). 84 In the framework of the VPT2 approximation, 85 the ground-state rotational constants can be expressed as (1) where the α r 's are the vibration−rotation interaction constants and the sum runs over all r vibrational modes. Noted is that the evaluation of the α r 's implies anharmonic force field calculations and that the sum appearing in eq 1 (contrary to individual terms) does not involve any resonance issue at the VPT2 level (for details, see, e.g., refs 11, 86, 87). ΔB vib i being a small fraction of the corresponding B e i (typically 0.5%), 88 it can be determined at an affordable level of theory (B3 in the present context) without significantly affecting the accuracy of the resulting vibrational ground-state rotational constant. 11,89 At the same time, inclusion of vibrational corrections is not warranted if the errors on the computed rotational constants are not much lower than 1% (50 MHz for a constant of 5000 MHz). Therefore, equilibrium rotational constants require very accurate geometrical parameters, which can be obtained only with state-of-the-art composite methods incorporating high excitation orders in the correlation treatment. These methods are able to deliver errors on equilibrium rotational constants as low as 0.1% (5 MHz for a rotational constant of 5000 MHz). 90 The reduced cost junChSF12 composite method delivers typical relative errors of 0.2%, 11,80,91 which are still sufficient for the unequivocal prediction and assignment of different conformers in the MW spectra of flexible molecules. Higher relative errors (typically 0.4−0.5%) are obtained at the rDSD level. However, the systematic nature of the errors permits geometrical parameters to be obtained and, thus, equilibrium rotational constants, rivaling the accuracy of the jun-ChSF12 counterparts by the linear regression approach (LRA). In this model, the computed geometrical parameters (r comp ) are corrected for systematic errors by means of scaling factors (a) and offset values (b) depending on the nature of the involved atoms and determined once for ever from a large database of accurate semiexperimental (SE) equilibrium geometries: 38,92 (2) The a and b values for different bonds and valence angles are taken from ref 38. Noted is that the intrinsic accuracy of the rDSD model leads in most case to b = 0.0 together with very small a values for bond lengths and that, among valence angles, only OCO and HCH need be corrected. Several studies have confirmed that very accurate molecular structures can be obtained employing this approach (referred to in the following as rDSD-LRA). 16  Additional parameters of particular relevance for MW spectroscopy are the nuclear quadrupole coupling constants (χ ii , i referring to the inertia axis a, b, or c). 94 Nuclear quadrupole coupling is the interaction between the quadrupole moment of a nucleus with nuclear spin I ≥ 1) and the electric gradient at the nucleus itself. 86 Since at least one 14 N quadrupolar nucleus is present in all amino acids, nuclear quadrupole coupling constants are important for accurate predictions of rotational spectra because they determine a splitting of the rotational transitions, which generates the socalled hypefine structure. Since a systematic study of rDSD quadrupole coupling constants has not yet been performed, the comparison with the experimental values for several conformers of different amino acids represents per se an interesting benchmark. We anticipate that vibrational effects on nuclear quadrupole coupling constants are usually smaller than the uncertainty affecting the computed equilibrium values, and thus they have not been considered in this work.
Finally, the components of dipole moments determine the intensities of rotational transitions and, as already mentioned, rDSD is expected to provide reliable values. 63 Concerning technical details, the Gaussian package 95 has been used for all calculations except the junChSF12 and QH ones, which have been performed with the help of the Molpro 76 and GoodVibes 83 software, respectively.

STRUCTURE AND SOFT DEGREES OF FREEDOM
The conformation of isolated amino acids is determined by and ω = C α −C′−O−H) and side chain (χ, defined more precisely in the following) torsional angles, as shown in the central panel of Figure 2. However, the nonplanarity of the NH 2 moiety suggests replacing the customary ϕ dihedral angle where LP is the nitrogen lone-pair perpendicular to the plane defined by the two aminic hydrogens and the C α atom.
For purposes of consistency with the original experimental studies, capital letters L, M, N, ... are used in some cases to label conformers of amino acids with polar side chains in order of decreasing relative populations estimated from MW spectra. 39,45,46,96

The Smallest Prototypes: Glycine and Alanine.
Glycine has been extensively characterized from both experimental and computational points of view (see refs 97−101 and references therein). Its limited size allowed the exploitation of state-of-the-art composite schemes including, together with CBS and CV contributions evaluated at the CCSD(T) level, also full account of triple excitations, perturbative inclusion of quadruple excitations, and relativistic contributions (CBS+CV+fT+pQ+rel). 101 All the eight conformers mentioned above (I, II, III, I′, III′, Ic, IIIc, and I′c) have been characterized with four of them (I, III, Ic, and IIIc) having a planar backbone (C s point group) and the other four (labeled with an asterisk to signal the presence of two equivalent nonplanar backbones) lacking any symmetry 98,99 (see Table S2 of the SI). Concerning relative stabilities, the junChSF12 model performs remarkably well with an average absolute error of 6 cm −1 from the most accurate available results 101 (see Table 1). The largest discrepancy (13 cm −1 ) is observed for the II conformer, which is slightly stabilized by triple and quadruple excitations. Also the accuracy of the rDSD model (maximum error (MAX) and MUE of 29 and 15 cm −1 with respect to the most accurate available results) is largely sufficient for most purposes and gives further support to the use of this computational level for geometry optimizations and harmonic frequency evaluations.
Zero point and thermal contributions have a nonnegligible effect, leading to a significant destabilization of structure II and a strong stabilization of structure III (see Table 1). Inclusion of anharmonic contributions in ZPEs is needed for obtaining quantitative results but does not alter the stability order of the different conformers. Finally, the main effect of the QH corrections is to reduce the overstabilization of structure III produced by the harmonic oscillator model (see Table 1).
A shorter N···O distance in the II form with respect to I parallels the greater strength of the OH···N hydrogen bond with respect to its NH···O counterpart. Despite these relative hydrogen-bond strengths, the I conformer is more stable than II by about 230 cm −1 due to the more favorable (ω = 180°T able 1. Relative Electronic Energies (ΔE), Enthalpies at 0 K (ΔH 0°= Δ(E+ZPE)), and Free Energies at Room Temperature (ΔG°) (all in cm −1 ; 1 kJ/mol = 83.59 cm −1 ) for the Glycine Conformers versus ω = 0°) arrangement of the carboxylic group in the I form. The role of the arrangement of the carboxylic group is confirmed by the nearly constant destabilization of the Ic and I′c forms with respect to their I and I′ counterparts (1690 cm −1 for Ic vs I and 1687 cm −1 for I′c vs I′). At the same time, the reduced stability of the III form with respect to I (about 600 cm −1 ) is related to the lower strength of the bifurcated NH 2 ···O(H) hydrogen bond with respect to its NH 2 ···O(�C) counterpart for identical arrangements of the carboxylic moiety. Finally conformers I′ and III′ are less stable than their I and III counterparts (by 430 and 330 cm −1 , respectively) because a bifurcated hydrogen bond is replaced by a more conventional single hydrogen bond. This trend could change in the presence of polar side chains because it allows the formation of additional backbone (side chain) hydrogen bonds (vide infra).
Computation of energy barriers ruling the interconversion between pairs of adjacent conformers shows that structures III and I′ relax easily to structure I (with energy barriers of about 250 and 70 cm −1 , respectively), whereas structure I′c relaxes to structure Ic (with an energy barrier of about 25 cm −1 ). Furthermore, the relative stability of structures III′ (927 cm −1 ), Ic (1679 cm −1 ), and IIIc (2043 cm −1 ) are too low to permit their unequivocal characterization by MW spectroscopy. We are thus left with only two conformers (I and II), which could be (and have actually been) detected in MW experiments. 40 The availability of the experimental rotational constants for several isotopic species allowed the determination of very accurate semiexperimental equilibrium structures. 102 For the I conformer, the MAX and MUE of rDSD geometrical parameters with respect to their semiexperimental counterparts are 0.0049 and 0.0019 Å for bond lengths and 0.46 and 0.15°f or valence angles. The rDSD-LRA model does not change the situation for valence angles but reduces the errors of bond lengths by about five times (0.0008 and 0.0004), reaching the accuracy of state-of-the-art composite methods. 11,102 More generally, all the computed spectroscopic parameters of the I and II conformers are in remarkable agreement with their experimental counterparts 40 (see Table 2), with MAX and MUE of 30.2 and 13.6 MHz for rotational constants, 0.23 and 0.13 MHz for quadrupole coupling constants, and 0.1 and 0.05 D for dipole moment components. The errors for rotational constants and quadrupole coupling constants are close to those delivered by the ChS composite method (MAX and MUE of and 60.8 and 16.5 MHz for rotational constants 0.19 and 0.10 for quadrupole coupling constants). 99 These results confirm that junChSF12 relative energies, rDSD-LRA structural parameters, and rDSD spectroscopic parameters can be confidently used for the comparison with experiments and represent reliable benchmarks for less refined quantum chemical methods.  Moving to alanine, 41,103−108 the two sides of the average backbone plane are no longer equivalent, with two nearly isoenergetic minima (corresponding to positive or negative values of the ψ dihedral angle) being expected at least for structures of II, I′, III′, and I′c type. The number of conformers thus increases to 12, but unconstrained geometry optimizations lead also to a splitting of structure III into III and III − , although the energy difference is so tiny that an effective planar structure is expected. In all the energy minima the methyl group is found in a staggered position with respect to the substituents at C α with rotational barriers of about 1200 cm −1 , close to the value of 1140 cm −1 obtained for ethane at a comparable computational level. 109 The MAX and MUE of rDSD computations with respect to the junChSF12 reference (42.1 and 15.7 cm −1 ) are more than five times smaller than the corresponding B3 values (222.2 and 91.6 cm −1 ) and less than half the corresponding MP2 values (96.5 and 28.8 cm −1 ). What is even more important, junChSF12 and rDSD provide the same stability order, whereas B3 and MP2 computations overestimate the stability of type II conformers (see Table S3 of the SI).
As already mentioned, the comparison with experiment requires the computation of the relative free energies for the different conformers at the temperature of the carrier gas (in order to evaluate their population) and of transition states ruling their interconversion. The results collected in Table 3 show that all the conformers involving ω values around 0°(Ic, III − c, I′ − c, and I′c) are too unstable to permit their unequivocal detection in MW experiments. Furthermore, relaxation of I′ and III′ conformers to their more stable I and III counterparts is ruled by low energy barriers, which are easily overcome in the typical conditions of supersonic-jet expansion. Low energy barriers govern also the relaxation of III to I and II to II − conformers. As a consequence, only the I and II − conformers could be detected in MW studies, with the former collecting the populations of I, III − , III, I′, I′ − , III′, and III′ − conformers and the latter those of the II and II − conformers. It is remarkable that the relative population of conformer I computed at room temperature from the free energies collected in Table 1 (76%) is in good agreement with the experimental estimate (80%), 41 whereas a significantly lower relative population (54%) would have been predicted neglecting zero point and thermal effects. Table 4 collects the experimental and computed rotational parameters for the I and II − conformers. A remarkable agreement is noted with the MAX and MUE of rDSD-LRA/ B3 rotational constants (36.1 and 10.5 MHz) being even better than those (50.5 and 12.2 MHz) obtained at the much more expensive CCSD(T)/cc-pVTZ level. 106 It is noteworthy that for both conformers of alanine the error on the B 0 rotational constant is much higher than those affecting the other two rotational constants, whereas in both the observed conformers of glycine the largest error was found for A 0 .
The geometrical parameter most sensitive to conformational changes is the NC α C′ valence angle, which decreases by about 3.5°when going from the I to the II − conformer, consistent with the trans-angle rule of hyperconjugative and steric effects. 110 At the same time, the C�O bond length shows the expected lengthening by about 0.002−0.003 Å when going from free (structure II − ) to hydrogen-bonded (structure I) forms.
The only significant differences between the geometrical parameters of glycine and those of alanine concerns the C α −C′ bond length (shorter in glycine by about 0.007 Å for both conformers) and the NC α C′ valence angle (narrower in glycine by about 2°for both conformers). Therefore, the main structural differences between glycine and alanine are highly localized at the C α . As already mentioned, the ψ torsional angle characterizes the backbone deviation from planarity (see Tables S2 and S3 of the SI). For I conformers, it is exactly equal to 180°in glycine, whereas the lack of any symmetry induces a change of more than 15°in alanine. On the other hand, comparable ψ values are observed for the II forms of glycine and alanine (12°and 15°, respectively).

Amino Acids with Polar Side Chains.
Systematic investigations have revealed that, in analogy with alanine, the natural amino acids containing simple nonpolar side chains (valine, 42 isoleucine, 43 and leucine 44 ) present two dominant conformers of types I and II, respectively. On the other hand, the conformational landscape of natural amino acids with polar side chains is much richer due to the synergy or competition between intrabackbone and backbone (side chain) hydrogen bonds.
Let us start our discussion from serine (Ser), which has two soft degrees of freedom in its CH 2 OH side chain (χ 1 = N− C α −C β −O and χ 2 = C α −C β −O−H), with the OH moiety able to act either as donor or acceptor in quite strong intramolecular hydrogen bonds. 111 The increased number of soft degrees of freedom (from 3 to 5) makes this system suitable for applying the PES exploration strategy introduced in the previous sections, which produces 12 low-energy conformers (see Table 5).
However, the IIg − g − conformer relaxes to the more stable IIg − t form through rotation around χ 2 ; IIIg − g relaxes to Ig − g through rotation around ψ; the less stable IIg − t conformer relaxes to its more stable counterpart through a planar structure (invert ϕ′, ψ, and ω); Igt relaxes to I′gg − through rotation around χ 2 , and Igg relaxes to III′gg through rotation around ψ. We are thus left with seven conformers possibly detectable in MW experiments: three of type II, two of type III′, and one each for types I and I′ (see Figure 3).
All the most stable conformers are stabilized by both intrabackbone and backbone (side chain) hydrogen bonds (see Figure 3). Furthermore, contrary to III conformers, III′ structures are locked in sufficiently deep wells to become detectable by one HNH···OH (III′gg) or OH···O�C (III′tg − ) hydrogen bond between the backbone and the side chain in addition to the intrabackbone HNH···OH hydrogen bond.
ZPEs and thermal contributions alter the ordering of the four most stable conformers stabilizing, as usual, structures of  Best estimates of relative free energies at room temperature (ΔG°in cm −1 ) and dihedral angles optimized at the rDSD level (ϕ′, ψ, ω,    45 According to both theory and experiments, the first four conformers (one of type I, one of type I′, and two of type II) are significantly more stable than the two conformers of type III′ and a further conformer of type II, which have, in turn, comparable stability (see Table 6). The rotational constants of the two most stable conformers have been recently computed by geometry optimizations at the ChS level, reaching MAX and MUE of 28.7 and 10.6 MHz, respectively. 23,112 It is noteworthy that even smaller MAX and MUE (17.7 and 8.1 MHz, respectively) are obtained at the rDSD-LRA level, whose strongly reduced cost has allowed us to compute the spectroscopic parameters of all the other lowenergy conformers. The remarkable agreement between computed and experimental results for all the detected conformers of serine confirms the accuracy of our computational strategy. The next studied system is threonine (Thr), 113 in which a methyl group replaces one of the hydrogen atoms bonded to C β , leading to the CHCH 3 OH side chain which has again two soft degrees of freedom (χ 1 = N−C α −C β −O and χ 2 = C α −C β − O−H) since the terminal methyl group is frozen in a staggered conformation with an estimated rotation barrier of 1400 cm −1 . There is now a second chiral center in addition to the C α atom, with the natural amino acid being 2S,3R-threonine. The conformational landscape of threonine has been investigated in two different studies, 113,114 which obtained 71 and 56 conformers, respectively, in a range of about 4000 cm −1 , but the final set of conformers was the same up to a relative energy of 1600 cm −1 . The knowledge-based step of our conformational exploration started from the 12 low-energy conformers of serine collected in Table 5, each of them being then split into two nonequivalent structures. Next, the IM-EA algorithm was used to generate additional low-energy minima. At the end of these two steps and the subsequent filtering/refinement we are left with the 10 low-energy conformers (within an energy range of 1000 cm −1 ) collected in Table 7. It is noteworthy that this finding is in full agreement with ref 114.
The predicted population of conformer IIg − g − is too low to allow its detection in MW experiments, and conformers IIgt and Igt relax easily to conformers IIgg and Ig − g, respectively.
We are thus left with the same number (seven) and backbone conformation (three conformers of type II, two of type III′, and one each for types I and I′) of the structures discussed above for serine, which should be (and have actually been 46 ) detected in MW experiments. However, the presence of the β methyl group increases the energy barrier governing relaxation of the III′g − g conformer to its Ig − g counterpart from about 200 to about 800 cm −1 when going from serine to threonine. As a consequence, the III′g − g conformer is observed in threonine in place of the less stable III′tg − conformer observed in serine (see Figure 4). At the same time, a general destabilization of all conformers with respect to IIgg accompanies the substitution of a β hydrogen atom with a methyl group (see Figure 5).
The two most stable (IIgg and Ig − g) and the three least stable (III′g − g, III′gg, and IIg − t) conformers are the same in terms of electronic energies, enthalpies, or free energies. The relative ordering of the two intermediate conformers is, instead, altered by both ZPE and thermal contributions.
All the spectroscopic parameters of the seven low-energy conformers of threonine detected in a recent microwave study 46 show a remarkable agreement with those computed for the most stable conformers predicted by our computations (see Table 8). The relative stability order estimated from the experimental results is Ig − g > IIgg > I′gg − > IIg − t ≈ III′g − g ≈ IItg − ≈ III′gg, which is in general agreement with the computed relative free energies except for the inversion between Ig − g and IIgg conformers and the position of the IIg − t structure.
Replacement of the oxygen atom in the side chain of serine by a sulfur produces cysteine (Cys), whose CH 2 SH side chain has again two soft degrees of freedom (χ 1 = N−C α −C β −S and χ 2 = C α −C β −S−H). One might think that the same Best estimates of relative free energies at room temperature (ΔG°in cm −1 ) and dihedral angles optimized at the rDSD level (ϕ′, ψ, ω,  conformers should be detected for cysteine and serine. However, the strengths of the H-bonds possibly formed by the thiol group are weaker than those of its alcohol counterpart. Therefore, it is expected that the barriers separating low-lying conformers decrease and in some instances may even disappear. In ref 115, a systematic scan of the conformational PES at the MP2/cc-pVTZ level led to the identification of 71 unique conformers, thus defining a reference data set. The knowledge-based step of our PES exploration involved the 12 low-energy conformers found for serine and integration of these structures with those issued from the IM-EA exploration employing sufficiently high energy thresholds allowed us to retrieve all the structures of the reference data set. 115 Then, refinement of the results by the usual energy tresholds led to the 9 conformers collected in Table 9. The rDSD results are once again in very good agreement with their junChSF12 counterparts (MAX and MUE of 44 and 24 cm −1 , respectively). Among those nine conformers, the two least stable ones have too low populations to allow their unequivocal experimental characterization and the conformer I′gg − relaxes easily to its Ig − g counterpart, which has a similar shape. Therefore, the number of detectable conformers reduces to 6: two each for types I, II, and III′ (see Figure 6). The backbone structure of the most stable conformer and the general trends are similar to those discussed above for serine and threonine (see Figure 5), but the conformers Igg and IIg − g − replace the I′gg − and IIg − t counterparts observed in both serine and threonine.
The spectroscopic parameters computed at the rDSD level are in remarkable agreement with their experimental counterparts 39 with MUEs of 11.7, 5.7, and 3.1 MHz for the A 0 , B 0 , and C 0 rotational constants, respectively (Table 10). The errors on B 0 and C 0 are quite low already at the rDSD level (see Table S9 of the SI), whereas errors as large as 40 MHz are obtained for the A 0 rotational constant. For most conformers, Figure 5. Observed conformers of threonine, serine, and cysteine. The relative free energies at room temperature (ΔG in cm −1 , see text for details) are given for each amino acid with respect to its IIgg conformer. The relations between the observed conformers of the three amino acids are highlighted with dashed lines. The conformer I′gg − of cysteine, which has not been detected in MW studies, is reported with orange labels. The computed relative free energies at room temperature (ΔG 0 in cm −1 ) are also reported. b Computed data are at the rDSD level (including LRA corrections for equilibrium rotational constants) except for electronic energies (junChSF12) and vibrational corrections to equilibrium rotational constants (B3). c Standard errors are shown in parentheses in units of the last digits.

Journal of Chemical Theory and Computation
pubs.acs.org/JCTC Article the C−S bond is nearly perpendicular to the average backbone direction (see Figure 6) and is, in turn, roughly aligned with the a axis. As a consequence, any overestimation of the C−S bond length results in a nonnegligible underestimation of the A 0 rotational constant. In this connection, the LRA correction brings the computed values in remarkable agreement with experiment (the maximum error is obtained for the Ig − g conformer and amounts to 18 MHz, i.e., 0.4%).
Let us now analyze aspartic acid, the simplest amino acid containing two carboxylic groups. The CH 2 COOH side chain has three dihedral angles ( . However, χ 3 is frozen in trans (favored and not explicitly labeled in the following) or cis (labeled by c in the following) conformations. A recent systematic analysis of the conformational landscape 116 identified 19 energy minima in a range of 3500 cm −1 , and we were able to locate all those minima by our general exploration strategy with enlarged energy thresholds. Within this panel of candidates, only 9 conformers have electronic energies lying within 1000 cm −1 above the absolute energy minimum (see Table 11). Once again, a good quantitative agreement is observed between junChSF12 and rDSD results Best estimates of relative free energies at room temperature (ΔG°in cm −1 ) and dihedral angles optimized at the rDSD level (ϕ′, ψ, ω, χ 1 = N− C α −C β −S, and χ 2 = C α −C β −S−H in degrees) are also given. See main text for details. b Sum of columns 2, 3, 4, 5, 6, and 7. c Relaxes to Ig − g.  The computed relative free energies at room temperature (ΔG 0 in cm −1 ) are also reported. b Computed data are at the rDSD level (including LRA corrections for equilibrium rotational constants) except for electronic energies (junChSF12) and vibrational corrections to equilibrium rotational constants (B3). c Standard errors are shown in parentheses in units of the last digits.

Journal of Chemical Theory and Computation
pubs.acs.org/JCTC Article with the MAX and MUE between the two methods being 83.5 and 30.2 cm −1 without any inversion in the relative stability order. Inclusion of zero point and thermal effects produces significant changes in the trend issued from relative electronic energies with the most striking effect being, as usual, the destabilization of all the conformers showing type II hydrogen bridges (see Table 11). The six most populated conformers shown in Figure 7 are significantly more stable than the next 3 ones, and exactly six species were detected in MW experiments. 96 Both the I′g − t and III′gt conformers are more stable than their I and III counterparts due to the replacement of an intrabackbone bifurcated NH 2 ···O�C or NH 2 ···OH hydrogen bond by a single HNH···O�C or HNH···OH hydrogen bond plus a single HNH···O�C backbone (side chain) hydrogen bond. The increased stability explains also the absence of lowbarrier relaxation paths from these conformers to I structures.
The spectroscopic parameters collected in Table 12 show a remarkable agreement between theory and experiment. It is noteworthy that previous MP2/6-311++G(d,p) 96 computations forecasted that one or two different conformers should be experimentally detected and that the spectroscopic constants obtained at that level show MAX and MUE with respect to experiment (29.2 and 10.6 MHz) more than three times larger than their rDSD-LRA counterparts (8.2 and 3.1 MHz). The rDSD MUE (smaller than 0.2%) approaches again the accuracy of state-of-the-art composite methods for small semirigid molecules 117 and permits the unbiased assignment of MW spectra. 118 The stability order of the six most populated conformers is, however, quite different between theory and experimental estimates with the strongest discrepancy concerning the inversion of the relative stability of I and II species. Although the experimental populations take into account also possible relaxation of higher-energy structures to the most stable conformers, according to the computed free energies the initial populations of all the species outside the six most stable ones are too low to alter the computed relative populations. From another point of view, the experimental estimates are based on a number of assumptions, which might not be fulfilled in the present case. Also taking these considerations in mind, the agreement between theory and experiment concerning the nature and spectroscopic parameters of all the observable species remains remarkable.
The last system considered in this study is asparagine, which is the only proteinogenic α-amino acid, together with glutamine, 119 containing an amide group. The soft degrees of freedom of the asparagine side chain (CH 2 CONH 2 ) include two dihedral angles (χ 1 = N−C α −C β −C γ , χ 2 = C α −C β −C γ −N) because the coupled rotation/inversion displacements of the NH 2 amide moiety from the planar reference structure can be safely added to the panel of stiff degrees of freedom. The amide moiety can act either as a proton donor or as a proton acceptor, with this increasing the number of possible backbone (side chain) intramolecular hydrogen bonds. Asparagine in the gas-phase has been widely studied by both computational 47,120 and experimental 47,121 points of view, but a comprehensive characterization of its structure and conformational landscape has not yet been performed by state-of-the-art quantum chemical methods.
The usual exploration/refinement strategy provides 5 conformers with rDSD electronic energies within a little more than 1000 cm −1 above the absolute energy minimum (see Table 13). At this level only the most stable IIgg conformer (see Figure 8) should be detectable in MW experiments. The situation is thus very different from that found in the case of aspartic acid because the presence of the NH 2 amidic moiety in the side chain permits the compensation of the weak hydrogen bond in the carboxylic moiety (lacking in II structures of aspartic acid with respect to their I Sum of columns 2, 3, 4, 5, 6, and 7. a Best estimates of relative free energies at room temperature (ΔG°in cm −1 ) and dihedral angles optimized at the rDSD level (ϕ′, ψ, ω,  In fact, an analogous situation would involve a 180°rotation of the OH moiety in the carboxylic group of the side chain in aspartic acid away from its most stable arrangement. ZPE and thermal contributions strongly stabilize all conformers with respect to the most stable IIgg strucure, so that the IIg − t form (see Figure 8) might become accessible to experimental characterization. As a matter of fact, several searches of transition states connecting IIg − t and IIgg conformers gave quite high energy barriers preventing any effective relaxation path. As a consequence, there is a disagreement between theory and experiment 47 about the number of low-lying conformers of asparagine. However, comparison between computed and experimental spectroscopic parameters for the single conformer detected in the MW study of ref 47 shows the usual remarkable agreement (see Table 14) with MAX and MUE as low as 12.6 and 4.7 MHz for rotational constants and 0.19 and 0.08 MHz for quadrupole coupling constants.

Trends of Intramolecular Interactions.
The accurate results obtained for several amino acids permit the strengths of the main interactions governing the conformational landscapes of these flexible systems to be estimated. In particular, approximate values for the strengths of different hydrogen bonds can be computed from prototypical systems and used to rationalize energy differences among the conformers of different amino acids in terms of sums of stabilizations from near-atom interactions. Based on the energy difference between Ic and I or I′c and I′ conformers of glycine, for each carboxyl group ω = 180°is more stable than ω = 0°b y about 1700 cm −1 and the same applies to χ 3 in the case of aspartic acid. Concerning other situations, the hydrogen bond donors can be ranked in the order O−H > N−H > S−H, and the hydrogen bond acceptors in the order N > O > S. As a The computed relative free energies at room temperature (ΔG 0 in cm −1 ) are also reported. b Computed data are at the rDSD level (including LRA corrections for equilibrium rotational constants) except for electronic energies (junChSF12) and vibrational corrections to equilibrium rotational constants (B3). c Standard errors are shown in parentheses in units of the last digits. Best estimates of Gibbs free energies (ΔG°in cm −1 ) and dihedral angles optimized at the rDSD level (ϕ′, ψ, ω, χ 1 = N−C α −C β −C γ , and χ 2 = C α − C β −C γ −N in degrees) are also given. The χ 3 angle (C β −C γ −N−H) is always close to 0°. See main text for details. b Sum of columns 2, 3, 4, 5, 6, and 7.  On the other hand, in serine and threonine, conformer II becomes more stable due to the extra stabilization related to an O−H··· O�C hydrogen bond involving the backbone and the side chain. The same occurs in cysteine, where the 800 cm −1 gained from the S−H···O�C hydrogen bond makes the IIgg conformer more stable than the Igg counterpart by about 600 cm −1 . An analogous situation is found in aspartic acid, where the amine moiety is involved at the same time in an OH···N hydrogen bond within the backbone and a HNH··· O�C hydrogen bond with the side chain. Finally, IIgg is by far the most stable conformer in asparagine because the presence of an amide group allows the formation of two additional backbone (side chain) hydrogen bonds. Type III conformers are intrinsically less stable than their I counterparts (due to the lower strength of NH 2 ···O−H with respect to NH 2 ···O�C hydrogen bond), and moreover, they can easily relax to I forms through rotation around ψ when not locked by additional interactions. However, III′ conformers featuring a single H− N−H···O−H hydrogen bond can be stabilized and locked into sufficiently deep energy wells upon involvement of the released N−H bond into additional hydrogen bonds with the side chain. This is the case, for instance, of the III′gg conformer in serine, threonine (H−O···H−N−H···O−H), and cysteine (H−O···H−N−H···S−H). Hydrogen bonding is surely the driving force ruling the general trends of structures and relative stabilities, but the detailed geometry and energy changes between conformers depend strongly on other stereoelectronic effects like, e.g., hyperconjugation or steric repulsion. For instance, any additive picture based on individual hydrogen bond strengths is tuned by the preference of bulky vicinal substituents for trans or gauche conformations, which, in turn, depends on the balance between electrostatic, steric, and hyperconjugative effects. Furthermore, vibrational effects (affecting both ZPEs and entropic contributions) alter the stability order provided by relative electronic energies and must be taken into the proper account.
While the reader is referred to studies of specific systems for more detailed analyses along these lines, 46,47,96,111,115 we point out that only the availability of accurate results including all the stereoelectronic and vibrational effects (like those reported in the present paper) can provide an unbiased reference for building more realistic models (e.g., force fields including non additive terms) for the study of flexible biomolecules.

CONCLUDING REMARKS
In this paper, a general strategy aimed at the unbiased disentanglement of the conformational bath of flexible biomolecule building blocks in the gas phase has been further improved and validated for the specific case of representative natural α-amino acids. The use of curvilinear internal coordinates permits the separation between stiff and soft degrees of freedom. Then, effective exploration of the soft variables can be performed by purposely tailored evolutionary algorithms, whose fitness scores are obtained by constrained geometry optimizations of the stiff degrees of freedom employing a fast semiempirical method. Refinement of the energies and structures by a hybrid and then a last-generation double-hybrid functional allows very reliable results to be obtained minimizing the number of expensive computations. Application of the procedure to supersonic jet experiments requires also the location of transition states ruling the interconversion between pairs of adjacent energy minima and the identification of fast relaxation processes. Improved structures and relative energies are obtained by the rDSD-LRA approach and the junChSF12 composite method, respectively. Finally, the spectroscopic parameters of sufficiently populated conformers can be safely computed at the rDSD level.
The results obtained for glycine, alanine, and, especially, different natural α-amino acids with polar side chains are in full agreement with the available spectroscopic data and permit their unbiased interpretation in terms of the cooperation or competition between intrabackbone and backbone (side chain) hydrogen bonds.
Together with the intrinsic interest of the studied molecules, the results of the present investigation show that highly reliable analysis of the conformational landscape is today possible for flexible building blocks of biomolecules in the gas phase. Furthermore, we provide benchmark results for the validation Full set of parameters of the IM-EA algorithm (Table  S1) and different contributions to the relative energies and rotational constants of low-energy conformers for glycine (Table S2), alanine (Table S3), serine (Tables  S4 and S5), threonine (Tables S6 and S7), cysteine (Tables S8 and S9), aspartic acid (Tables S10 and S11) and asparagine (Tables S12 and S13) (PDF)